Insert-aware Partitioning and Indexing Techniques for Skewed Database Workloads

Authors

  • Eugene Wu
  • Samuel Madden
Abstract

Many data-intensive websites are characterized by a dataset that grows much faster than the rate at which users access the data, and possibly by high insertion rates. In such systems, the growing size of the dataset leads to a larger overhead for maintaining and accessing indexes, even while the query workload becomes increasingly skewed. Additionally, database index update costs can be a non-trivial proportion of the overall system cost. Shinobi introduces a cost model that takes index update costs into account, and proposes database design algorithms that optimally partition tables, drop indexes from partitions that are not queried often, and maintain these partitions as workloads change. We show a 60x performance improvement over traditionally indexed tables using a real-world query workload derived from a traffic monitoring application, and a more than 8x improvement for a Wikipedia workload.

Thesis Supervisor: Samuel Madden. Title: Associate Professor of Electrical Engineering and Computer Science.
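The trade-off the abstract describes, between index maintenance paid on every insert and faster lookups on the few partitions that are actually queried, can be illustrated with a toy calculation. The sketch below is not Shinobi's cost model: the `Partition` fields, the three unit-cost constants, and the `choose_indexes` helper are all hypothetical and chosen only to show why a rarely queried, insert-heavy partition can be cheaper to leave unindexed.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    rows: int      # rows currently stored in the partition
    queries: int   # queries hitting the partition in the observed window
    inserts: int   # inserts routed to the partition in the same window


# Hypothetical unit costs in arbitrary units (assumptions, not measured values).
SCAN_COST_PER_ROW = 0.001   # cost of examining one row during a full scan
INDEX_LOOKUP_COST = 0.01    # cost of answering one query through the index
INDEX_UPDATE_COST = 0.05    # extra cost one insert pays to maintain the index


def cost(p: Partition, indexed: bool) -> float:
    """Estimated cost of serving the window's workload on one partition."""
    if indexed:
        # Cheap lookups, but every insert also pays for index maintenance.
        return p.queries * INDEX_LOOKUP_COST + p.inserts * INDEX_UPDATE_COST
    # No maintenance cost, but every query scans the whole partition.
    return p.queries * p.rows * SCAN_COST_PER_ROW


def choose_indexes(partitions: list[Partition]) -> dict[str, bool]:
    """Keep an index only where it lowers the estimated cost."""
    return {p.name: cost(p, True) < cost(p, False) for p in partitions}


if __name__ == "__main__":
    parts = [
        Partition("hot", rows=10_000, queries=1_000, inserts=100),
        Partition("cold", rows=10_000, queries=1, inserts=1_000),
    ]
    print(choose_indexes(parts))
```

With these made-up numbers the frequently queried partition keeps its index, while the insert-heavy, rarely queried one does not. A real design tool would also have to choose the partition boundaries and account for the cost of repartitioning as the workload shifts, which this sketch ignores.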

Similar resources

Scaling transactional workloads on the cloud

In this paper, we address the problem of transparently scaling out transactional (OLTP) workloads on relational databases, to support database-as-a-service in a cloud computing environment. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availabi...

Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing

The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. Hence, recent in-situ query processing systems operate directly over raw data, alleviating the loading cost. At the same time, analytical workloads have an increasing number of ...

Improved Content Aware Image Retargeting Using Strip Partitioning

Driven by the rapid upsurge in the demand for and use of electronic media devices such as tablets, smartphones, laptops, and personal computers, and by their differing display specifications, including sizes and shapes, image retargeting has become one of the key components of communication technology and the internet. Existing image resizing techniques cannot preserve the most valuable information of image...

Indexing Highly Dynamic Hierarchical Data

Maintaining and querying hierarchical data in a relational database system is an important task in many business applications. This task is especially challenging when considering dynamic use cases with a high rate of complex, possibly skewed structural updates. Labeling schemes are widely considered the indexing technique of choice for hierarchical data, and many different schemes have been pr...

GraphTwist: Fast Iterative Graph Computation with Two-tier Optimizations

Large-scale real-world graphs are known to have highly skewed vertex degree distribution and highly skewed edge weight distribution. Existing vertex-centric iterative graph computation models suffer from a number of serious problems: (1) poor performance of parallel execution due to inherent workload imbalance at vertex level; (2) inefficient CPU resource utilization due to short execution time...

Publication date: 2010